random rotation
Understanding Adam Requires Better Rotation Dependent Assumptions
Despite its widespread adoption, Adam's advantage over Stochastic Gradient Descent (SGD) lacks a comprehensive theoretical explanation. This paper investigates Adam's sensitivity to rotations of the parameter space. We observe that Adam's performance in training transformers degrades under random rotations of the parameter space, indicating a crucial sensitivity to the choice of basis in practice. This reveals that conventional rotation-invariant assumptions are insufficient to capture Adam's advantages theoretically. To better understand the rotation-dependent properties that benefit Adam, we also identify structured rotations that preserve or even enhance its empirical performance. We then examine the rotation-dependent assumptions in the literature and find that they fall short in explaining Adam's behaviour across various rotation types. In contrast, we verify the orthogonality of the update as a promising indicator of Adam's basis sensitivity, suggesting it may be the key quantity for developing rotation-dependent theoretical frameworks that better explain its empirical success.
DRIVE: One-bit Distributed Mean Estimation
We consider the problem where nclients transmit d-dimensional real-valued vectors using dp1 `op1qqbits each, in a manner that allows the receiver to approximately reconstruct their mean. Such compression problems naturally arise in distributed and federated learning. We provide novel mathematical results and derive computationally efficient algorithms that are more accurate than previous compression techniques. We evaluate our methods on a collection of distributed and federated learning tasks, using a variety of datasets, and show a consistent improvement over the state of the art.
TABASCO: A Fast, Simplified Model for Molecular Generation with Improved Physical Quality
Vonessen, Carlos, Harris, Charles, Cretu, Miruna, Liรฒ, Pietro
State-of-the-art models for 3D molecular generation are based on significant inductive biases, SE(3), permutation equivariance to respect symmetry and graph message-passing networks to capture local chemistry, yet the generated molecules still struggle with physical plausibility. We introduce TABASCO which relaxes these assumptions: The model has a standard non-equivariant transformer architecture, treats atoms in a molecule as sequences and reconstructs bonds deterministically after generation. The absence of equivariant layers and message passing allows us to significantly simplify the model architecture and scale data throughput. On the GEOM-Drugs benchmark TABASCO achieves state-of-the-art PoseBusters validity and delivers inference roughly 10x faster than the strongest baseline, while exhibiting emergent rotational equivariance despite symmetry not being hard-coded. Our work offers a blueprint for training minimalist, high-throughput generative models suited to specialised tasks such as structure- and pharmacophore-based drug design. We provide a link to our implementation at github.com/carlosinator/tabasco.
WB_CameraReady.pdf
This document provides additional details, analysis, and experimental results. We begin by discussing the detailed experimental setup and implementation of the methods in Section A. Then, we provide additional empirical experiments against several other defense methods in Section B, and a discussion on the stealthiness of the backdoor images in the input space in Section C. Finally, we provide the supporting proofs for the claims in the main paper in Section D. A.1 Datasets As we described in the main paper, we use four datasets, MNIST, CIFAR10, GTSRB, and TinyImagenet, to evaluate our method. Note that MNIST, CIFAR10, and GTSRB have been widely used in the literature of backdoor attacks on DNN. On the other hand, the use of a more complex dataset, TinyImagenet, enables better evaluation for multiple-target backdoor attacks such as all-to-all, thanks to the diversity of images in TinyImagenet and its large number of classes. MNIST [28] is a subset of the larger dataset available from the National Institute of Technology.
Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation
Li, Xuanlin, Zhao, Tong, Zhu, Xinghao, Wang, Jiuguang, Pang, Tao, Fang, Kuan
Contact-rich bimanual manipulation involves precise coordination of two arms to change object states through strategically selected contacts and motions. Due to the inherent complexity of these tasks, acquiring sufficient demonstration data and training policies that generalize to unseen scenarios remain a largely unresolved challenge. Building on recent advances in planning through contacts, we introduce Generalizable Planning-Guided Diffusion Policy Learning (GLIDE), an approach that effectively learns to solve contact-rich bimanual manipulation tasks by leveraging model-based motion planners to generate demonstration data in high-fidelity physics simulation. Through efficient planning in randomized environments, our approach generates large-scale and high-quality synthetic motion trajectories for tasks involving diverse objects and transformations. We then train a task-conditioned diffusion policy via behavior cloning using these demonstrations. To tackle the sim-to-real gap, we propose a set of essential design options in feature extraction, task representation, action prediction, and data augmentation that enable learning robust prediction of smooth action sequences and generalization to unseen scenarios. Through experiments in both simulation and the real world, we demonstrate that our approach can enable a bimanual robotic system to effectively manipulate objects of diverse geometries, dimensions, and physical properties. Website: https://glide-manip.github.io/